In [1]:
import musicntd.scripts.hide_code as hide

From padding to subdivision

As mentioned in the first notebook, in previous experiments every bar of the tensor was zero-padded if it was shorter than the longest bar of the song.

This workaround is not satisfactory, as it creates runs of zeros (null artifacts) at the end of most slices of the tensor.

Description of the subdivision method

Instead, we decided to over-sample the chromagram (32-sample hop) and then select the same number of frames in each bar. Previously, frames were equally spaced across all bars of the tensor, which resulted in slices of unequal sizes (before padding). Now, every bar-chromagram contains the same number of frames, which is a parameter to be set. Within each bar-chromagram, frames are almost* equally spaced, but the gap between two consecutive frames can now differ from one bar to another.

We call subdivision of bars the number of frames selected in each bar. This parameter must be set, and we try to find a good value for it in the next part of this notebook.

Concretely, let's consider the chromagram of a particular bar, starting at time $t_0$ and ending at time $t_1$. This chromagram contains $n = (t_1 - t_0 + 1) \times \frac{s_r}{32}$ frames, with $s_r$ the sampling rate. In this chromagram, given a subdivision $sub$, we select the frames at indexes $\left\{ k \times \frac{n}{sub} \;:\; k \in \{0, 1, \dots, sub - 1\} \right\}$. As indexes need to be integers, each of these values is rounded.

*almost, because of the rounding operation presented above
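
The index selection described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name `select_frame_indexes`, the example bar length and the final clipping step are assumptions made for this sketch.

```python
import numpy as np

def select_frame_indexes(n_frames, subdivision):
    """Return `subdivision` integer frame indexes spread over a bar of `n_frames` frames."""
    k = np.arange(subdivision)
    # Ideal (real-valued) positions k * n / sub, rounded to the nearest integer index.
    indexes = np.rint(k * n_frames / subdivision).astype(int)
    # Safety clip (assumption): keeps indexes valid for very short bars.
    return np.minimum(indexes, n_frames - 1)

# Hypothetical bar containing 2756 over-sampled frames.
n = 2756
idx = select_frame_indexes(n, 96)
gaps = np.diff(idx)
print(len(idx), gaps.min(), gaps.max())
```

Because of the rounding, the gap between two consecutive selected frames takes two neighbouring integer values inside a bar, which is what "almost equally spaced" refers to.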

Setting the subdivision parameter

We will test three values for the subdivision parameter:

  • 96 (24 frames per beat, assuming 4 beats per bar),
  • 128 (32 frames per beat, assuming 4 beats per bar),
  • 192 (48 frames per beat, assuming 4 beats per bar).

We will test the segmentation on the entire RWC Popular dataset, with MIREX10 annotations, and with several ranks (16, 24, 32, 40) for $H$ and $Q$.

Note that, following the conclusion of Notebook 2, $W$ is now fixed to the $12 \times 12$ identity matrix.

In [2]:
# Define the annotation type
annotations_type = "MIREX10"
ranks_rhythm = [16,24,32,40]
ranks_pattern = [16,24,32,40]

Subdivision 96

Fixed ranks

Below are the segmentation results with the subdivision fixed to 96, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.
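
To make the role of the tolerance concrete, here is a simplified sketch of how precision, recall and F measure can be computed for boundary estimates. It uses a greedy matching and made-up boundary times; the function name `boundary_scores` is hypothetical, and the evaluation code behind the tables in this notebook may use a different matching strategy.

```python
def boundary_scores(estimated, reference, tolerance):
    """Precision, recall and F measure of estimated boundaries (times in seconds)."""
    estimated = sorted(estimated)
    unmatched = sorted(reference)
    true_positives = 0
    for est in estimated:
        # Greedy matching: pair the estimate with the earliest unused
        # reference boundary lying within the tolerance window.
        for ref in unmatched:
            if abs(est - ref) <= tolerance:
                unmatched.remove(ref)
                true_positives += 1
                break
    false_negatives = len(unmatched)
    precision = true_positives / len(estimated) if estimated else 0.0
    n_reference = true_positives + false_negatives
    recall = true_positives / n_reference if n_reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

# Made-up boundaries, for illustration only.
reference = [0.0, 10.0, 20.0, 30.0]
estimated = [0.2, 11.0, 20.4, 25.0]
print(boundary_scores(estimated, reference, tolerance=0.5))
print(boundary_scores(estimated, reference, tolerance=3.0))
```

In this toy example, the estimate at 11.0 seconds is a false positive with the 0.5-second tolerance but a true positive with the 3-second one, which is why scores at 3 seconds are systematically higher.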

In [3]:
zero_five_nine, three_nine = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=96, penalty_weight = 1)
Results at 0.5 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 8.4700 | 5.7900 | 10.3400 | 0.5977 | 0.4566 | 0.5092 |
| 16 | 24 | 8.8400 | 5.5000 | 9.9700 | 0.6171 | 0.4780 | 0.5312 |
| 16 | 32 | 8.7100 | 5.6300 | 10.1000 | 0.6136 | 0.4707 | 0.5247 |
| 16 | 40 | 8.9400 | 5.6600 | 9.8700 | 0.6164 | 0.4827 | 0.5336 |
| 24 | 16 | 9.0400 | 5.9100 | 9.7700 | 0.6063 | 0.4889 | 0.5339 |
| 24 | 24 | 9.6700 | 5.9700 | 9.1400 | 0.6246 | 0.5213 | 0.5601 |
| 24 | 32 | 9.7000 | 5.8800 | 9.1100 | 0.6270 | 0.5232 | 0.5632 |
| 24 | 40 | 9.5800 | 6.2300 | 9.2300 | 0.6103 | 0.5173 | 0.5530 |
| 32 | 16 | 9.8100 | 6.2300 | 9.0000 | 0.6180 | 0.5275 | 0.5618 |
| 32 | 24 | 9.8400 | 6.5100 | 8.9700 | 0.6023 | 0.5252 | 0.5535 |
| 32 | 32 | 10.1100 | 6.0400 | 8.7000 | 0.6266 | 0.5450 | 0.5767 |
| 32 | 40 | 9.8400 | 6.2800 | 8.9700 | 0.6117 | 0.5274 | 0.5594 |
| 40 | 16 | 9.2800 | 6.7700 | 9.5300 | 0.5776 | 0.4984 | 0.5270 |
| 40 | 24 | 9.7000 | 6.6900 | 9.1100 | 0.5907 | 0.5180 | 0.5450 |
| 40 | 32 | 9.7200 | 6.9000 | 9.0900 | 0.5856 | 0.5217 | 0.5452 |
| 40 | 40 | 10.0000 | 6.5700 | 8.8100 | 0.6063 | 0.5361 | 0.5621 |

Results at 3 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 10.8000 | 3.4600 | 8.0100 | 0.7712 | 0.5812 | 0.6522 |
| 16 | 24 | 10.9900 | 3.3500 | 7.8200 | 0.7787 | 0.5938 | 0.6641 |
| 16 | 32 | 10.8300 | 3.5100 | 7.9800 | 0.7676 | 0.5818 | 0.6515 |
| 16 | 40 | 11.0200 | 3.5800 | 7.7900 | 0.7672 | 0.5932 | 0.6593 |
| 24 | 16 | 11.3000 | 3.6500 | 7.5100 | 0.7642 | 0.6100 | 0.6690 |
| 24 | 24 | 11.7500 | 3.8900 | 7.0600 | 0.7641 | 0.6316 | 0.6813 |
| 24 | 32 | 11.6900 | 3.8900 | 7.1200 | 0.7590 | 0.6282 | 0.6785 |
| 24 | 40 | 11.7300 | 4.0800 | 7.0800 | 0.7510 | 0.6316 | 0.6775 |
| 32 | 16 | 11.8600 | 4.1800 | 6.9500 | 0.7465 | 0.6363 | 0.6780 |
| 32 | 24 | 11.9200 | 4.4300 | 6.8900 | 0.7357 | 0.6393 | 0.6746 |
| 32 | 32 | 12.1500 | 4.0000 | 6.6600 | 0.7559 | 0.6534 | 0.6931 |
| 32 | 40 | 12.1400 | 3.9800 | 6.6700 | 0.7604 | 0.6527 | 0.6938 |
| 40 | 16 | 11.8300 | 4.2200 | 6.9800 | 0.7422 | 0.6352 | 0.6740 |
| 40 | 24 | 12.1600 | 4.2300 | 6.6500 | 0.7475 | 0.6515 | 0.6869 |
| 40 | 32 | 12.1000 | 4.5200 | 6.7100 | 0.7349 | 0.6489 | 0.6804 |
| 40 | 40 | 12.3300 | 4.2400 | 6.4800 | 0.7505 | 0.6613 | 0.6945 |

Oracle ranks

In this condition, we keep, for each song, only the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.
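
The difference between the "fixed ranks" and "oracle ranks" conditions can be illustrated with a small sketch. The array below is synthetic and its layout (songs, ranks for $Q$, ranks for $H$) is an assumption made for this example; it is not the data structure returned by the `hide` helpers.

```python
import numpy as np

# Synthetic F measures, shape (n_songs, n_ranks_Q, n_ranks_H). Illustration only.
f_measures = np.array([
    [[0.50, 0.60], [0.55, 0.40]],
    [[0.70, 0.65], [0.30, 0.80]],
    [[0.20, 0.25], [0.45, 0.35]],
])

# Fixed ranks: one (Q, H) pair for the whole dataset, averaged over songs.
fixed_rank_means = f_measures.mean(axis=0)

# Oracle ranks: the best (Q, H) pair is chosen separately for each song,
# and the resulting per-song maxima are then averaged.
oracle_per_song = f_measures.reshape(len(f_measures), -1).max(axis=1)
oracle_mean = oracle_per_song.mean()

print(fixed_rank_means.max(), oracle_mean)
```

The oracle average can never be lower than the best fixed-rank average, since each song contributes its own maximum; this is why it acts as an upper bound.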

In [4]:
hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_nine)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_nine)

At 0.5 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 11.8 | 4.48 | 7.01 | 0.7305 | 0.6334 | 0.6716 |

At 3 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 13.53 | 2.69 | 5.28 | 0.8421 | 0.7258 | 0.7729 |

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that yield the highest F measure for each song.

In [5]:
hide.plot_3d_ranks_study(zero_five_nine, ranks_rhythm, ranks_pattern)
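
As a rough sketch of what such a distribution counts, the snippet below tallies, for each song, the rank pair that maximizes the F measure. The scores are random placeholders and the names `ranks_q`, `ranks_h` and `f_measures` are hypothetical; this is not the implementation of `plot_3d_ranks_study`.

```python
from collections import Counter

import numpy as np

ranks_q = [16, 24, 32, 40]
ranks_h = [16, 24, 32, 40]

# Placeholder scores, shape (n_songs, n_ranks_Q, n_ranks_H). Not real results.
rng = np.random.default_rng(0)
f_measures = rng.random((5, len(ranks_q), len(ranks_h)))

counts = Counter()
for song_scores in f_measures:
    # Position of the highest F measure in this song's (Q, H) grid.
    q_idx, h_idx = np.unravel_index(np.argmax(song_scores), song_scores.shape)
    counts[(ranks_q[q_idx], ranks_h[h_idx])] += 1

for (rank_q, rank_h), n_songs in sorted(counts.items()):
    print(f"Q={rank_q}, H={rank_h}: {n_songs} song(s)")
```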

Below is the histogram of the F measures obtained with the oracle ranks.

In [6]:
hide.plot_f_mes_histogram(zero_five_nine)

Finally, here are the 5 worst songs in terms of F measure in this condition.

In [7]:
hide.return_worst_songs(zero_five_nine, 5)
Out[7]:
[('77.wav', 0.2857),
 ('51.wav', 0.359),
 ('98.wav', 0.4),
 ('63.wav', 0.4242),
 ('50.wav', 0.4324)]

Subdivision 128

Fixed ranks

Below are the segmentation results with the subdivision fixed to 128, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.

In [8]:
zero_five_cent, three_cent = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=128, penalty_weight = 1)
Results at 0.5 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 8.5900 | 5.6000 | 10.2200 | 0.6063 | 0.4638 | 0.5183 |
| 16 | 24 | 8.6600 | 5.7600 | 10.1500 | 0.6006 | 0.4676 | 0.5182 |
| 16 | 32 | 8.9800 | 5.3700 | 9.8300 | 0.6293 | 0.4837 | 0.5384 |
| 16 | 40 | 8.9800 | 5.4700 | 9.8300 | 0.6230 | 0.4849 | 0.5376 |
| 24 | 16 | 9.3900 | 5.8700 | 9.4200 | 0.6156 | 0.5069 | 0.5477 |
| 24 | 24 | 9.4000 | 6.1300 | 9.4100 | 0.6082 | 0.5076 | 0.5459 |
| 24 | 32 | 9.3900 | 6.1500 | 9.4200 | 0.6053 | 0.5055 | 0.5439 |
| 24 | 40 | 9.5100 | 6.1100 | 9.3000 | 0.6096 | 0.5123 | 0.5500 |
| 32 | 16 | 9.6600 | 6.2600 | 9.1500 | 0.6075 | 0.5202 | 0.5526 |
| 32 | 24 | 10.1100 | 6.0100 | 8.7000 | 0.6257 | 0.5425 | 0.5747 |
| 32 | 32 | 9.8800 | 6.4000 | 8.9300 | 0.6072 | 0.5317 | 0.5599 |
| 32 | 40 | 9.9500 | 6.3800 | 8.8600 | 0.6092 | 0.5352 | 0.5627 |
| 40 | 16 | 9.5400 | 6.3300 | 9.2700 | 0.6008 | 0.5121 | 0.5450 |
| 40 | 24 | 9.5700 | 6.6700 | 9.2400 | 0.5857 | 0.5134 | 0.5401 |
| 40 | 32 | 9.8400 | 6.3600 | 8.9700 | 0.6073 | 0.5280 | 0.5559 |
| 40 | 40 | 9.8200 | 6.9900 | 8.9900 | 0.5878 | 0.5231 | 0.5455 |

Results at 3 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 10.7800 | 3.4100 | 8.0300 | 0.7715 | 0.5816 | 0.6535 |
| 16 | 24 | 10.9300 | 3.4900 | 7.8800 | 0.7697 | 0.5891 | 0.6575 |
| 16 | 32 | 10.8600 | 3.4900 | 7.9500 | 0.7687 | 0.5845 | 0.6532 |
| 16 | 40 | 11.0100 | 3.4400 | 7.8000 | 0.7719 | 0.5940 | 0.6618 |
| 24 | 16 | 11.5500 | 3.7100 | 7.2600 | 0.7648 | 0.6227 | 0.6762 |
| 24 | 24 | 11.5200 | 4.0100 | 7.2900 | 0.7502 | 0.6195 | 0.6690 |
| 24 | 32 | 11.5300 | 4.0100 | 7.2800 | 0.7463 | 0.6194 | 0.6682 |
| 24 | 40 | 11.7800 | 3.8400 | 7.0300 | 0.7629 | 0.6330 | 0.6831 |
| 32 | 16 | 11.8100 | 4.1100 | 7.0000 | 0.7485 | 0.6342 | 0.6764 |
| 32 | 24 | 11.9200 | 4.2000 | 6.8900 | 0.7436 | 0.6390 | 0.6790 |
| 32 | 32 | 12.1100 | 4.1700 | 6.7000 | 0.7512 | 0.6496 | 0.6876 |
| 32 | 40 | 12.1000 | 4.2300 | 6.7100 | 0.7476 | 0.6491 | 0.6858 |
| 40 | 16 | 11.8500 | 4.0200 | 6.9600 | 0.7523 | 0.6380 | 0.6803 |
| 40 | 24 | 11.9800 | 4.2600 | 6.8300 | 0.7421 | 0.6430 | 0.6795 |
| 40 | 32 | 11.9700 | 4.2300 | 6.8400 | 0.7452 | 0.6432 | 0.6792 |
| 40 | 40 | 12.1300 | 4.6800 | 6.6800 | 0.7312 | 0.6501 | 0.6783 |

Oracle ranks

In this condition, we keep, for each song, only the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.

In [9]:
hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_cent)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_cent)

At 0.5 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 11.68 | 4.55 | 7.13 | 0.7253 | 0.6272 | 0.6646 |

At 3 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 13.43 | 2.7 | 5.38 | 0.839 | 0.7218 | 0.7676 |

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that yield the highest F measure for each song.

In [10]:
hide.plot_3d_ranks_study(zero_five_cent, ranks_rhythm, ranks_pattern)

Below is the histogram of the F measures obtained with the oracle ranks.

In [11]:
hide.plot_f_mes_histogram(zero_five_cent)

Finally, here are the 5 worst songs in terms of F measure in this condition.

In [12]:
hide.return_worst_songs(zero_five_cent, 5)
Out[12]:
[('77.wav', 0.2857),
 ('51.wav', 0.359),
 ('71.wav', 0.2778),
 ('63.wav', 0.375),
 ('50.wav', 0.3784)]

Subdivision 192

Fixed ranks

Below are the segmentation results with the subdivision fixed to 192, for the different rank values, on the RWC Pop dataset.

Results are computed with tolerances of 0.5 seconds and 3 seconds, respectively.

In [13]:
zero_five_hunnine, three_hunnine = hide.compute_ranks_RWC(ranks_rhythm,ranks_pattern, W = "chromas", annotations_type = annotations_type,
                                                  subdivision=192, penalty_weight = 1)
Results at 0.5 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 8.5000 | 5.6200 | 10.3100 | 0.6042 | 0.4599 | 0.5138 |
| 16 | 24 | 8.6800 | 5.7400 | 10.1300 | 0.6037 | 0.4691 | 0.5196 |
| 16 | 32 | 8.8100 | 5.5400 | 10.0000 | 0.6141 | 0.4744 | 0.5267 |
| 16 | 40 | 8.9000 | 5.7100 | 9.9100 | 0.6086 | 0.4791 | 0.5278 |
| 24 | 16 | 9.4600 | 5.8400 | 9.3500 | 0.6224 | 0.5109 | 0.5534 |
| 24 | 24 | 9.3200 | 5.9800 | 9.4900 | 0.6094 | 0.5027 | 0.5437 |
| 24 | 32 | 9.5500 | 6.1700 | 9.2600 | 0.6121 | 0.5154 | 0.5524 |
| 24 | 40 | 9.4900 | 6.1300 | 9.3200 | 0.6088 | 0.5119 | 0.5498 |
| 32 | 16 | 9.6300 | 6.1200 | 9.1800 | 0.6099 | 0.5190 | 0.5540 |
| 32 | 24 | 9.8100 | 6.4500 | 9.0000 | 0.6090 | 0.5259 | 0.5565 |
| 32 | 32 | 9.9700 | 6.3300 | 8.8400 | 0.6077 | 0.5346 | 0.5617 |
| 32 | 40 | 9.7800 | 6.3800 | 9.0300 | 0.6030 | 0.5270 | 0.5559 |
| 40 | 16 | 9.7400 | 6.3500 | 9.0700 | 0.6048 | 0.5214 | 0.5528 |
| 40 | 24 | 9.8700 | 6.5100 | 8.9400 | 0.6058 | 0.5269 | 0.5560 |
| 40 | 32 | 9.6700 | 6.5600 | 9.1400 | 0.5944 | 0.5215 | 0.5481 |
| 40 | 40 | 9.8700 | 6.8800 | 8.9400 | 0.5892 | 0.5313 | 0.5515 |

Results at 3 seconds

| Rank Q | Rank H | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 10.6800 | 3.4400 | 8.1300 | 0.7712 | 0.5767 | 0.6494 |
| 16 | 24 | 10.9600 | 3.4600 | 7.8500 | 0.7729 | 0.5918 | 0.6598 |
| 16 | 32 | 10.8700 | 3.4800 | 7.9400 | 0.7671 | 0.5843 | 0.6525 |
| 16 | 40 | 11.0300 | 3.5800 | 7.7800 | 0.7660 | 0.5947 | 0.6592 |
| 24 | 16 | 11.6800 | 3.6200 | 7.1300 | 0.7742 | 0.6298 | 0.6848 |
| 24 | 24 | 11.6500 | 3.6500 | 7.1600 | 0.7655 | 0.6264 | 0.6797 |
| 24 | 32 | 11.6600 | 4.0600 | 7.1500 | 0.7499 | 0.6268 | 0.6736 |
| 24 | 40 | 11.8400 | 3.7800 | 6.9700 | 0.7666 | 0.6379 | 0.6881 |
| 32 | 16 | 11.6900 | 4.0600 | 7.1200 | 0.7507 | 0.6274 | 0.6741 |
| 32 | 24 | 11.8400 | 4.4200 | 6.9700 | 0.7379 | 0.6344 | 0.6725 |
| 32 | 32 | 12.2300 | 4.0700 | 6.5800 | 0.7549 | 0.6538 | 0.6910 |
| 32 | 40 | 12.0200 | 4.1400 | 6.7900 | 0.7454 | 0.6460 | 0.6835 |
| 40 | 16 | 12.0700 | 4.0200 | 6.7400 | 0.7564 | 0.6467 | 0.6877 |
| 40 | 24 | 12.0800 | 4.3000 | 6.7300 | 0.7454 | 0.6463 | 0.6828 |
| 40 | 32 | 11.9400 | 4.2900 | 6.8700 | 0.7384 | 0.6426 | 0.6778 |
| 40 | 40 | 12.1800 | 4.5700 | 6.6300 | 0.7343 | 0.6548 | 0.6823 |

Oracle ranks

In this condition, we keep, for each song, only the ranks leading to the highest F measure.

In that sense, it is an optimistic upper bound on the metrics.

In [14]:
hide.printmd("**At 0.5 seconds:**")
best_chr_zero_five = hide.best_f_one_score_rank(zero_five_hunnine)
hide.printmd("**At 3 seconds:**")
best_chr_three = hide.best_f_one_score_rank(three_hunnine)

At 0.5 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 11.67 | 4.5 | 7.14 | 0.727 | 0.6267 | 0.6656 |

At 3 seconds:

| | True Positives | False Positives | False Negatives | Precision | Recall | F measure |
|---|---|---|---|---|---|---|
| Optimizing the F measure on each song | 13.5 | 2.64 | 5.31 | 0.8451 | 0.725 | 0.7721 |

Below is the distribution of the optimal ranks in the "oracle ranks" condition, i.e. the distribution of the ranks for $H$ and $Q$ that yield the highest F measure for each song.

In [15]:
hide.plot_3d_ranks_study(zero_five_hunnine, ranks_rhythm, ranks_pattern)

Below is the histogram of the F measures obtained with the oracle ranks.

In [16]:
hide.plot_f_mes_histogram(zero_five_hunnine)

Finally, here are the 5 worst songs in terms of F measure in this condition.

In [17]:
hide.return_worst_songs(zero_five_hunnine, 5)
Out[17]:
[('51.wav', 0.3333),
 ('77.wav', 0.2857),
 ('71.wav', 0.2857),
 ('63.wav', 0.4118),
 ('34.wav', 0.4375)]

Conclusion

We did not find the differences in segmentation results between the three subdivisions to be significant.

We therefore concluded that the three tested subdivisions were equally satisfactory for our experiments, and we decided to continue with the subdivision of 96 only: as the smallest tested value, it reduces computation time and complexity.

96 also has the advantage (compared to 128) of being divisible by both 3 and 4, which are the most common numbers of beats per bar in Western pop music (even if, for now, we have restricted our study to music with 4 beats per bar).
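
The divisibility argument can be checked with a few lines of arithmetic. This is only an illustration of the remark above; the variable names are chosen for this sketch.

```python
# For each tested subdivision, compute the number of frames per beat
# in bars of 3 and 4 beats, and check whether both are whole numbers.
summary = {}
for subdivision in (96, 128, 192):
    frames_per_beat = {beats: subdivision / beats for beats in (3, 4)}
    divisible = all(subdivision % beats == 0 for beats in (3, 4))
    summary[subdivision] = (frames_per_beat, divisible)
    print(subdivision, frames_per_beat, divisible)
```

With 128 frames per bar, a 3-beat bar would not contain a whole number of frames per beat, whereas 96 gives 32 frames per beat in that case.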